Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation

Authors

  • Giorgio Fumera
  • Fabio Roli
  • Alessandra Serrau
Abstract

In this paper, the performance of bagging in classification problems is analysed theoretically, using a framework developed by Tumer and Ghosh and extended by the authors. A bias-variance decomposition is derived that relates the expected misclassification probability attained by linearly combining classifiers trained on N bootstrap replicates of a fixed training set to that attained by a classifier trained on a single bootstrap replicate of the same training set. The theoretical results show that the expected misclassification probability of bagging has the same bias component as that of a single bootstrap replicate, while the variance component is reduced by a factor of N. Experimental results show that the performance of bagging, as a function of the number of bootstrap replicates, closely follows the theoretical prediction. Finally, it is shown that the theoretical results derived for bagging also apply to other randomisation-based methods for constructing multiple classifiers, such as the random subspace method and tree randomisation.
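
The "same bias, variance divided by N" statement can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions and not the authors' derivation: each ensemble member's estimate is modelled as a fixed bias plus independent Gaussian noise, and the names `simulate_average`, `bias` and `sigma` are hypothetical, chosen for this example.

```python
import random
import statistics


def simulate_average(n_members, bias=0.5, sigma=1.0, trials=20000, seed=0):
    """Return (mean, variance) of the average of n_members noisy estimates.

    Toy assumption: each member's estimate is bias + Gaussian noise with
    standard deviation sigma, drawn independently for every member.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        # One "ensemble": n_members independent estimates, linearly combined
        # by simple averaging.
        draws = [rng.gauss(bias, sigma) for _ in range(n_members)]
        averages.append(sum(draws) / n_members)
    return statistics.fmean(averages), statistics.pvariance(averages)


if __name__ == "__main__":
    for n in (1, 5, 25):
        mean, var = simulate_average(n)
        # The mean should stay near the bias; the variance should be close
        # to sigma^2 / N, up to sampling noise.
        print(f"N={n:2d}  mean={mean:.3f}  variance={var:.4f}  sigma^2/N={1.0 / n:.4f}")
```

In this toy model the mean of the averaged estimate stays near the bias for every N, while its variance shrinks roughly as sigma^2 / N. The independence assumption is what produces the exact 1/N factor; correlated members would reduce the variance by less.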

Similar resources

Improving the Quality of Text Classification Using a Two-Level Classifier Committee

Automated text classification has become especially important owing to the increasing availability of documents in digital form and the ensuing need to organize them. Although the problem belongs to the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown better performance than the others. I...

Combining Bias and Variance Reduction Techniques for Regression Trees

Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contribution of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches — ...

Application of ensemble learning techniques to model the atmospheric concentration of SO2

For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

Using Data Mining Models for Differential Diagnosis of Iron Deficiency Anemia and β-thalassemia Minor

Introduction: One of the most common types of anemia is iron deficiency anemia, whose main differential diagnosis is β-thalassemia minor. The rapid and accurate screening of β-thalassemia minor has particular importance for pre-marriage medical counseling and the prevention of the birth of neonates with β-thalassemia major and differentiating it from iron deficiency anemia to avoid unnecessar...

Publication date: 2005